MIPS-X is a 32b microprocessor with an on-chip 16Kb instruction cache.
The chip is implemented in a 2µm drawn channel length, 2-layer metal CMOS
technology, contains 150K transistors in an 8mm by 8.5mm die, and has 84
signal pins and 24 power pins. At a peak operating frequency of 20MHz the
chip will dissipate less than 2W.

MIPS-X uses a very simple instruction format to execute the common
instructions as quickly as possible. All instructions are 32 bits, and use a fixed
format for the register specifiers. Like many other RISC machines [1,2],
MIPS-X is a load/store machine. It avoids the high pin count or fast bus cycle times
required to support two word fetches per cycle (one instruction and one data) by using
a large on-chip instruction cache. The cache reduces the off-chip instruction
bandwidth by over a factor of 5 and the overall bandwidth by a factor of 2 to
2.5.
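A simple traffic model shows how the two bandwidth figures relate. Assume (our assumption, not a model from the paper) that each instruction needs one instruction word, that d data words per instruction go off-chip, and that the on-chip cache leaves a fraction r of instruction fetches going off-chip, ignoring block-refill overhead. The overall reduction R is then:

```latex
R = \frac{1 + d}{r + d}
\quad\Longrightarrow\quad
d = \frac{1 - R\,r}{R - 1}
```

Taking r = 1/5 exactly, R = 2 gives d = 0.6 and R = 2.5 gives d = 1/3, so under this model the quoted factors correspond to roughly 0.33 to 0.6 off-chip data words per instruction.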

The machine has a 5-stage pipeline: Instruction Fetch (IF), Register Fetch
(RF), Execute (ALU), Memory access (MEM), and register Write Back
(WB). During IF, the instruction is fetched from the on-chip instruction
cache. The RF cycle is used to drive the register specifiers from the Instruction
Register to the register decoders and then to do the actual register fetch. The
ALU cycle is used to compute the effective memory address and send it to the
address pins for load/store instructions, to compute branch destinations and
conditions for branch instructions, and to do an ALU or shifter operation for
compute instructions. The large external cache is accessed during the MEM
cycle. Finally, during WB a computed result or fetched data word is written to
the register file.
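The overlap of the five stages can be sketched in Python. This is an illustration only, assuming an ideal pipeline with no cache misses, stalls, or squashed instructions; `pipeline_chart` is a hypothetical helper, not part of any MIPS-X tooling.

```python
STAGES = ["IF", "RF", "ALU", "MEM", "WB"]

def pipeline_chart(num_instructions):
    """Return, for each instruction, the stage it occupies in every cycle.

    Instruction i enters IF in cycle i and advances one stage per cycle;
    an empty string means the instruction is not in the pipeline that cycle.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            stage = cycle - i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "")
        rows.append(row)
    return rows

if __name__ == "__main__":
    for i, row in enumerate(pipeline_chart(3)):
        print(f"i{i}: " + " ".join(f"{s:>3}" for s in row))
```

Once the pipeline is full, five instructions are in flight in every cycle and one completes per cycle.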

A floorplan and die photo of the processor are shown in Figures 1 and
2. The organization of the instruction cache is unique. The 16Kb cache is
divided into 32 blocks of 16 words. The Tag unit contains the control logic for
the cache, 512 valid bits (one per word) and a small CAM for the 32 tags. By organizing the
valid bit memory as sixteen 32-bit words, the valid bit access and CAM compare
can occur simultaneously, allowing the processor to quickly determine whether
the access hits in the cache. A cache miss causes the instruction and its successor to be
fetched during the following two cycles. Trace-driven simulations put the
average on-chip miss ratio at about 12%. With an external
cache miss ratio of 5%, this yields a sustained throughput of 12 MIPS on large
benchmarks. The Register File contains a 32-word dual-ported file, 2 temporary
and 2 bypass registers. The Execute unit contains a 32b funnel shifter, registers
for multiplication/division support, a 32b ALU, and the processor status word.
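A funnel shifter extracts a 32-bit window from the 64-bit concatenation of two operands, so one structure can serve rotates and both kinds of right and left shift. The Python model below is a sketch of that general technique; the function names and the way each operation chooses its two inputs are our assumptions, not the MIPS-X control encoding.

```python
MASK32 = 0xFFFFFFFF

def funnel_shift(hi, lo, amount):
    """Return the 32 bits of the 64-bit value hi:lo starting `amount` bits up (0..32)."""
    if not 0 <= amount <= 32:
        raise ValueError("amount must be in 0..32")
    concat = ((hi & MASK32) << 32) | (lo & MASK32)
    return (concat >> amount) & MASK32

def rotate_right(x, n):
    # The same word in both halves turns the window into a rotation.
    return funnel_shift(x, x, n % 32)

def shift_right_logical(x, n):
    # Zeros in the upper half fill the vacated high bits.
    return funnel_shift(0, x, n)

def shift_right_arithmetic(x, n):
    # Copies of the sign bit in the upper half give sign extension.
    sign_fill = MASK32 if x & 0x80000000 else 0
    return funnel_shift(sign_fill, x, n)

def shift_left_logical(x, n):
    # The window (32 - n) bits up in x:0 equals x << n truncated to 32 bits.
    return funnel_shift(x, 0, 32 - n)
```

The shift types differ only in what is placed in the two halves, which is why a single shifter datapath suffices.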
The ALU uses a doubly-bypassed (by 4 and by 8) Manchester carry chain.
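The logic of a bypassed carry chain can be modeled at the bit level. The sketch below assumes that "by 4 and by 8" means aligned skip paths over 4-bit and 8-bit groups (our reading, not stated in the text). When every bit in a group propagates, the group's carry-out equals its carry-in and can be forwarded past it; otherwise the carry ripples bit by bit as in a plain Manchester chain. The model captures only the logic, not the circuit timing that motivates the bypasses.

```python
MASK32 = 0xFFFFFFFF

def carry_skip_add(a, b, cin=0):
    """Return (32-bit sum, carry-out) using a carry chain with 8- and 4-bit bypasses."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(32)]  # propagate bits
    g = [((a >> i) & (b >> i)) & 1 for i in range(32)]  # generate bits
    carry = [0] * 33  # carry[i] is the carry into bit i
    carry[0] = cin & 1
    i = 0
    while i < 32:
        for width in (8, 4):
            # Bypass: an aligned group whose bits all propagate passes its
            # carry-in straight through to every bit and to its carry-out.
            if i % width == 0 and all(p[i:i + width]):
                for j in range(i + 1, i + width + 1):
                    carry[j] = carry[i]
                i += width
                break
        else:
            # No bypass applies: ripple the carry one bit.
            carry[i + 1] = g[i] | (p[i] & carry[i])
            i += 1
    total = 0
    for j in range(32):
        total |= (p[j] ^ carry[j]) << j
    return total, carry[32]
```

For example, `0x0F0F0F0F + 0xF0F0F0F0` propagates in every bit, so an incoming carry skips across all four 8-bit groups.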
The PC unit contains a branch displacement adder, an incrementer and four previous
PC values used to restart the machine following an exception. The Instruction
Register latches the cache output and sends partially decoded instructions to
the datapath.
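The Tag unit's parallel hit check, described earlier in this paragraph, can be sketched as follows. This is one plausible reading, assumed here: the 4-bit word offset selects one of the sixteen 32-bit valid-bit words (one bit per block), the CAM compares the tag against all 32 block tags at once, and the access hits if the two 32-bit vectors share a set bit. Word addressing, the class name and its methods are hypothetical, and block replacement is left to the caller.

```python
class InstructionCacheTags:
    """Hypothetical model of hit detection for 32 blocks of 16 words."""

    BLOCKS = 32
    WORDS_PER_BLOCK = 16

    def __init__(self):
        self.tags = [None] * self.BLOCKS         # the 32-entry tag CAM
        self.valid = [0] * self.WORDS_PER_BLOCK  # 16 x 32-bit words = 512 valid bits

    @staticmethod
    def split(word_addr):
        # Assumed word addressing: low 4 bits pick the word within a block.
        return word_addr >> 4, word_addr & 0xF   # (tag, word offset)

    def cam_match(self, tag):
        """32-bit vector with bit k set if block k holds `tag`."""
        match = 0
        for k, t in enumerate(self.tags):
            if t == tag:
                match |= 1 << k
        return match

    def lookup(self, word_addr):
        tag, offset = self.split(word_addr)
        # Neither operation depends on the other, so hardware can do both at once.
        match = self.cam_match(tag)
        valid_row = self.valid[offset]
        return (match & valid_row) != 0

    def fill_word(self, block, word_addr):
        """Mark one word of `block` valid, retagging the block if needed."""
        tag, offset = self.split(word_addr)
        if self.tags[block] != tag:
            # Giving the block a new tag invalidates all 16 of its words.
            self.tags[block] = tag
            for w in range(self.WORDS_PER_BLOCK):
                self.valid[w] &= ~(1 << block)
        self.valid[offset] |= 1 << block
```

Because valid bits are kept per word, a block can be filled a word or two at a time, consistent with a miss fetching only the instruction and its successor.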

The processor is controlled by two small finite state machines, shown in
Figure 3. One handles exceptions and the nullifying of certain instructions in branch
delay slots. The other handles internal cache misses. The only other control logic is
secondary instruction decode.
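The cache-miss machine's behaviour, as described in the text (a miss causes the instruction and its successor to be fetched over the following two cycles), can be sketched as a three-state machine. The state names and transitions below are our assumptions; the actual states are those in Figure 3(b), which this sketch does not reproduce.

```python
from enum import Enum, auto

class MissState(Enum):
    RUN = auto()           # hypothetical: normal fetching from the on-chip cache
    FETCH_MISSED = auto()  # hypothetical: first cycle after a miss
    FETCH_NEXT = auto()    # hypothetical: second cycle, fetching the successor

def next_state(state, miss):
    """Advance one cycle; the miss flag only matters in RUN."""
    if state is MissState.RUN:
        return MissState.FETCH_MISSED if miss else MissState.RUN
    if state is MissState.FETCH_MISSED:
        return MissState.FETCH_NEXT
    return MissState.RUN

def run(miss_signals):
    """Return the state occupied in each cycle for a sequence of per-cycle miss flags."""
    state = MissState.RUN
    trace = []
    for miss in miss_signals:
        trace.append(state)
        state = next_state(state, miss)
    return trace
```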

Features for testing MIPS-X include a pin that disables the on-chip cache
and a pin that forces the chip into cache-test mode. In this mode, the PC unit
generates sequential addresses while the data bus is connected directly to the
on-chip cache, allowing the cache to be read and written directly. Another pin
forces test mode, which connects groups of the control lines directly onto the
data pads, giving good observability of the control circuitry with a very small
amount of logic placed under an existing bus.
Figure 3: MIPS-X Finite State Machines. (a) Squash Finite State Machine; (b) Cache Miss Finite State Machine.